verifiably safe exploration
Verifiably Safe Exploration for End-to-End Reinforcement Learning
Hunt, Nathan, Fulton, Nathan, Magliacane, Sara, Hoang, Nghia, Das, Subhro, Solar-Lezama, Armando
Deep reinforcement learning algorithms (Sutton & Barto, 1998) are effective at learning, often from raw sensor inputs, control policies that optimize for a quantitative reward signal. Learning these policies can require experiencing millions of unsafe actions. Even if a safe policy is finally learned - which will happen only if the reward signal reflects all relevant safety priorities - providing a purely statistical guarantee that the optimal policy is safe requires an unrealistic amount of training data (Kalra & Paddock, 2016). The difficulty of establishing the safety of these algorithms makes it difficult to justify the use of reinforcement learning in safety-critical domains where industry standards demand strong evidence of safety prior to deployment (ISO-26262, 2011). Formal verification provides a rigorous way of establishing safety for traditional control systems (Clarke et al., 2018). The problem of providing formal guarantees in RL is called formally constrained reinforcement learning (FCRL).
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Middle East > Jordan (0.04)
- Research Report (0.64)
- Instructional Material > Course Syllabus & Notes (0.46)